Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add split-chain option to rank overlay plots #334

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sims1253
Copy link
Contributor

Figured I'd just start with this. Anyone feel free to use this as part of a larger PR doing this for all the relevant functions. Hope I got the testing right and didn't miss anything.

@sims1253
Copy link
Contributor Author

sims1253 commented Dec 16, 2024

Wondering if it'd be cleaner to have an explicit split_chains function or option for the mcmc_trace_data function, rather than implementing this for each of the plotting functions individually. Lemme know if you have ideas for a cleaner design :)

@jgabry
Copy link
Member

jgabry commented Dec 17, 2024

Thank you for this! I'm super busy the rest of this week unfortunately, so I may not have time to comment for a little while. But I'll come back to this when I can. I really appreciate the contribution!

Wondering if it'd be cleaner to have an explicit split_chains function or option for the mcmc_trace_data function, rather than implementing this for each of the plotting functions individually. Lemme know if you have ideas for a cleaner design :)

Probably a good idea.

@jgabry
Copy link
Member

jgabry commented Dec 17, 2024

Actually, it just occurred to me that there's already posterior::split_chains(). So something like this should already work with bayesplot:

x <- example_mcmc_draws()
x %>% 
  as_draws_array() %>%   # not necessary if already a posterior draws object like draws from cmdstanr
  split_chains() %>% 
  mcmc_rank_overlay()

Although the default color schemes will struggle with 8 chains.

@jgabry
Copy link
Member

jgabry commented Dec 17, 2024

So maybe we should add an example of doing this using posterior::split_chains to the doc instead of changing the bayesplot function? Or alternatively we could just call posterior::split_chains internally to do the chain splitting.

@paul-buerkner
Copy link
Collaborator

paul-buerkner commented Dec 17, 2024

I think splitting chains internally (via posterior::split_chains) and having a split_chains argument (default to FALSE) would be good to make users aware of this option and have it most user friendly, since it is crucial in the context of MCMC convergence checks

@sims1253
Copy link
Contributor Author

Oh nice :D good to know. Agree with adding it as a default False option to keep the interface stable.

@jgabry
Copy link
Member

jgabry commented Dec 18, 2024

I agree with an argument that defaults to FALSE.

I think given the way the bayesplot internals are written (they were written mostly before posterior was developed), in order to use posterior::split_chains inside of bayesplot it may need to be done inside the internal function prepare_mcmc_array() in between these two lines:

pars <- rename_transformed_pars(pars, transformations)
set_mcmc_dimnames(x, pars)

Something like this:

  pars <- rename_transformed_pars(pars, transformations)
  if (split_chains) {
    x <- posterior::split_chains(posterior::as_draws_array(x))
  }
  set_mcmc_dimnames(x, pars)

So a split_chains argument could be passed from mcmc_rank_overlay() to mcmc_trace_data() to prepare_mcmc_array().

@sims1253
Copy link
Contributor Author

I'll try to come up with a new version when I find the time this week :)
What do you think about renaming the chains/legend in the split chain case? I thought it was a nice touch to make it explicit in the output.
While we're at it, I suppose all of the rank-based plots should support this? So I would also add it for the mcmc_rank_hist function.

@sims1253
Copy link
Contributor Author

Small thing I noticed, split_chains requires a draws object. But mcmc_rank_xx worked with raw arrays so far. So the input would have to be cast via as_draws first.

@avehtari
Copy link
Contributor

Do we really need this? Based on the thread on BlueSky, it seems there was a misunderstanding of one of my replies (clearly I'm not being able to write clearly enough about a complex topic in 300 chars). At the moment I don't see need for this, and adding it might just cause confusion.

@sims1253 if you still think this is needed, please add justification to this PR or related issue, and we can discuss further, before you spend more time on this PR.

@sims1253
Copy link
Contributor Author

Hmm
I feel like IF rank plots SHOULD be used with split chains by default, it should be supported by bayesplot. If using split chains is a niche thing that you'd rarely use in practice, then I'd agree that it's not necessary.

@avehtari
Copy link
Contributor

If using split chains is a niche thing that you'd rarely use in practice, then I'd agree that it's not necessary.

There are many things that are commonly used in practice, even they should not be used that commonly (like traceplots). Given the available information at the moment, I think the split-chains in rank plots are useless and likely to be confusing and should not be advertised to avoid people wasting their time. I'm willing to change my mind given new evidence, but without it, I would close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants